Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Targeted Gene Metagenomic Data Analysis ◾ 277

The overall quality scores of the reads are high, but some reads have quality scores
below 20 (99% base-call accuracy) toward their ends. We can trim these low-quality bases
from the ends of the reads. The demultiplexed sequence length summary at the bottom of the
Interactive Quality Plot tab shows that all reads have the same length (275 bases). This
table helps us decide whether we need to make the read lengths equal.
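The link between a quality score of 20 and 99% accuracy comes from the Phred scale, on which a score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below makes the conversion explicit. The function names are our own illustration and are not part of QIIME 2.

```python
def phred_to_error_probability(q):
    """Return the probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Return the expected base-call accuracy for Phred score q."""
    return 1 - phred_to_error_probability(q)


# Print the accuracy for a few commonly quoted quality scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: accuracy = {phred_to_accuracy(q):.2%}")
```

A score of 20 therefore means about one wrong base call in 100, and a score of 30 about one in 1,000.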

If you decide to filter out reads with poor quality scores, you can use the
“quality-filter” plugin with the “q-score” method. This method, however, is applied to
single-end reads. For paired-end reads, you can join the forward and reverse reads and then
run “quality-filter q-score” on the merged reads; this will be discussed later with
clustering. Denoising methods also filter low-quality reads in their own way, as we will
see soon. But if your data consists of single-end reads, you can use “quality-filter
q-score” to remove low-quality reads with the following command:

qiime quality-filter q-score \
  --i-demux demux.qza \
  --p-min-quality 20 \
  --p-quality-window 5 \
  --p-min-length-fraction 0.8 \
  --p-max-ambiguous 0 \
  --o-filtered-sequences demux-filtered.qza \
  --o-filter-stats demux-filter-stats.qza

The default settings are “--p-min-quality 4”, “--p-quality-window 3”,
“--p-min-length-fraction 0.75”, and “--p-max-ambiguous 0”; these values are used for any
parameter omitted from the command above. For more information about these parameters,
use “qiime quality-filter q-score --help”. We will discuss this in more detail with
clustering.
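To give a feel for how the four parameters interact, here is a simplified Python sketch of the filtering logic as we understand it from the parameter descriptions in the plugin's help text. It is an illustration under that assumption, not the plugin's source code, and the function name `filter_read` is hypothetical.

```python
def filter_read(seq, quals, min_quality=4, quality_window=3,
                min_length_fraction=0.75, max_ambiguous=0):
    """Return the (possibly truncated) read, or None if it is discarded.

    Simplified illustration only; not the QIIME 2 implementation.
    """
    full_length = len(seq)
    run_start, run_length = None, 0
    for i, q in enumerate(quals):
        if q < min_quality:
            # Start or extend the current run of low-quality bases.
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length > quality_window:
                # Too many consecutive low scores: cut the read where the run began.
                seq, quals = seq[:run_start], quals[:run_start]
                break
        else:
            run_length = 0
    # Discard reads that became too short after truncation.
    if len(seq) < min_length_fraction * full_length:
        return None
    # Discard reads with too many ambiguous (N) base calls.
    if seq.count("N") > max_ambiguous:
        return None
    return seq, quals


# A 20-base read with the settings used in the command above.
settings = dict(min_quality=20, quality_window=5, min_length_fraction=0.8)
print(filter_read("A" * 20, [35] * 17 + [10] * 3, **settings))  # short low-quality tail
print(filter_read("A" * 20, [35] * 14 + [10] * 6, **settings))  # long low-quality tail
```

In this sketch the first read is kept at full length because its low-quality run does not exceed the window, while the second is truncated to 14 bases, falls below 80% of its original length, and is discarded.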

If the reads contain PCR primer sequences or any other non-biological sequences, you can
remove them using the “cutadapt” plugin. Once more, this is not applicable to our yoga data

but, just in case, if you had sequences with primers at this stage of the analysis, it would

FIGURE 7.8 Per base quality plots of the yoga data.